Abstract. In this work we first review some results in Classical Information Theory. Next, we try to generalize
these results by using the Tsallis entropy. We present a preliminary result and discuss our aims in this field.



1 Introduction 

Information theory deals with the measurement and transmission
of information through a channel. A fundamental work in this area
is Shannon's Information Theory (see [2], Chapter 11), which
provides many useful tools that are based on measuring information
in terms of the complexity of structures needed to encode a given
piece of information.

Shannon's theory solves two central problems for clas- 
sical information: 

(1) How much can a message be compressed; i.e., how 
redundant is the information? (The noiseless coding theo- 
rem). 

(2) At what rate can we communicate reliably over a 
noisy channel; i.e., how much redundancy must be incor- 
porated into a message to protect against errors? (The noisy 
channel coding theorem). 

In this theory, the information and the transmission channel are
formulated from a probabilistic point of view. In particular, it
firmly established that the concept of information has to be
accepted as a fundamental, logically sound concept, amenable to
scientific scrutiny; it cannot be viewed as a solely anthropomorphic
concept, useful in science only on a metaphoric level.

If the channel is a quantum one, or if the information to be sent
has been stored in quantum states, then Quantum Information Theory
starts. The fact that real systems suffer from unwanted interactions
with the outside world makes the systems undergo some kind of noise.
It is necessary to understand and control such noise processes in
order to build useful quantum information processing systems.
Quantum Information Theory basically deals with three main topics:

1. Transmission of classical information over quantum channels

2. Quantifying quantum entanglement for quantum information

3. Transmission of quantum information over quantum channels.

The aim of this work is to extend the Noiseless Channel Coding
Theorem to a nonextensive entropic form due to Tsallis [4]. Further
works in Quantum Information may also be provided as a consequence
of this research.

To achieve this goal, we review the development of Shannon's Theory
presented in [2], pp. 537. In this reference, a central definition
is that of an $\epsilon$-typical sequence. Some results in Shannon's
Theory can be formulated by using elegant properties of these
sequences.

Thus, after reviewing some results about $\epsilon$-typical
sequences, we try to generalize these results by using Tsallis
entropy, a kind of nonextensive entropy. That is the key idea of
this work.

Some results about generalizations of Shannon's Theory by using
Tsallis entropy can also be found in [5]. However, that reference
follows a different approach.

In Section 2 we review the classical information theory. Then, in
Section 3 we present a preliminary result by using nonextensive
theories. The Central Limit Theorem, used during the presentation
that follows, is developed in Section 5 in order to complete the
material.

2 Classical Information Theory 

First, we must discuss Shannon entropy and its relevance to
classical information.

A message is a string of $n$ letters chosen from an alphabet of $W$
letters:

$$A = \{a_1, a_2, \ldots, a_W\}.$$

Let us suppose an a priori probability distribution $p$:

$$p(a_i) = p_i, \qquad \sum_{i=1}^{W} p(a_i) = 1. \tag{1}$$

For example, the simplest case is for a binary alphabet where
$p(1) = p$ and $p(0) = 1 - p$; $0 \le p \le 1$.

For $n$ very large, the law of large numbers tells us that typical
strings will contain (in the binary case) about $n(1 - p)$ 0's and
about $np$ 1's. The number of distinct strings of this form is given
by the binomial coefficient $\binom{n}{np}$.

From the Stirling approximation [3] we know that
$\log(n!) = n \log n - n + O(\log n)$. Thus, we approximate the
binomial coefficient by (see [3] for more details):

$$\log \binom{n}{np} \approx n H(p), \tag{2}$$

where:

$$H(p) = -p \log p - (1 - p) \log(1 - p) \tag{3}$$

is the entropy function (observe that the logarithms have base 2).

Thus, from equation (2) we can see that the number of typical
strings is of order $2^{nH(p)}$.
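
The approximation (2) can be checked numerically. The following Python sketch compares $\frac{1}{n} \log \binom{n}{np}$ with $H(p)$ for growing $n$, using exact integer binomial coefficients. The function names `binary_entropy` and `log2_binomial` are illustrative choices and do not come from the references.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy H(p) of equation (3), in bits (base-2 logarithms)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_binomial(n: int, k: int) -> float:
    """log2 of the binomial coefficient C(n, k), from the exact integer value."""
    return math.log2(math.comb(n, k))


if __name__ == "__main__":
    p = 0.2
    for n in (10, 100, 1000, 10000):
        k = round(n * p)  # number of 1's in a typical string
        # The two printed values should get closer as n grows.
        print(n, log2_binomial(n, k) / n, binary_entropy(p))
```

The gap between the two values shrinks like $O(\log n / n)$, which is the correction term dropped from the Stirling approximation.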

Furthermore, the entropy $H$ has the following properties:

(a) $0 \le H(p) \le 1$, if $0 \le p \le 1$;

(b) $H(p) = 1$ only if $p = \frac{1}{2}$.

Thus, from property (a) we find that:

$$2^{nH(p)} < 2^{n}, \qquad p \neq \frac{1}{2}, \tag{4}$$

that is, we do not need a codeword for every $n$-letter sequence,
but only for the typical ones. In other words, we can compress the
information into a shorter string.
This can be generalized:

$$\frac{n!}{\prod_{x} (n p(x))!} \approx 2^{nH(X)}, \tag{5}$$

$$H(X) = \sum_{x} p(x) \left( -\log p(x) \right), \tag{6}$$

where $X : A \to \mathbb{R}$ is a random variable with probability
distribution $p(x)$.

This result points to a compression scheme. To accomplish this, we
also need to formulate these results more precisely. This is done in
the next section.

2.1 Compression Problem

First, we must formulate what a compression scheme is. Let us
suppose that $X_1, X_2, X_3, \ldots, X_n$ is an independent and
identically distributed classical information source over some
finite alphabet; that is, the expectations and variances are such
that $E(X_1) = E(X_2) = \ldots = E(X_n) = E(X)$ and
$D(X_1) = D(X_2) = \ldots = D(X_n) = D(X)$, where $X$ represents any
of the random variables, and expression (1) holds.

A Compression Scheme of Rate $R$, denoted by $C_n(x)$, maps possible
sequences $x = (x_1, x_2, \ldots, x_n)$ to a bit string of length
$nR$. The matching decompression scheme $D_n$ takes the $nR$
compressed bits and maps them back to a string of $n$ letters. This
operation is denoted by $D_n(C_n(x))$. A compression-decompression
scheme is said to be reliable if the probability that
$D_n(C_n(x)) = x$ approaches one as $n \to \infty$.

A fundamental result in this theory is Shannon's Noiseless Channel
Coding Theorem:

Suppose that $\{X_i\}$ are independent and identically distributed
random variables that define an information source with entropy
$H(X)$. Suppose $R > H(X)$. Then there exists a reliable compression
scheme of rate $R$ for the source. Conversely, if $R < H(X)$ then
any compression scheme will not be reliable.

In [2], pp. 537, this theorem is demonstrated following the
development given below.

Let us take a particular $n$-message: $x_1, x_2, \ldots, x_n$.

So, by the assumption of statistically independent random variables:

$$P(x_1, x_2, \ldots, x_n) = p(x_1) \cdot p(x_2) \cdot \ldots \cdot p(x_n). \tag{7}$$

Thus, typically, we expect (in the binary case):

$$P(x_1, x_2, \ldots, x_n) \approx p^{np} (1 - p)^{(1-p)n}. \tag{8}$$

So:

$$-\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) \approx \langle -\log(p(x)) \rangle = H(X), \tag{9}$$

in the sense that, for any $\epsilon > 0$ and for $n$ large enough,
we have

$$H(X) - \epsilon \le -\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon. \tag{10}$$

Thus:

$$2^{-n(H(X) - \epsilon)} \ge P(x_1, x_2, \ldots, x_n) \ge 2^{-n(H(X) + \epsilon)}. \tag{11}$$

A useful equivalent reformulation of this expression is:

$$\left| -\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) - H(X) \right| \le \epsilon. \tag{12}$$

A sequence that satisfies this property is called $\epsilon$-typical.
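
The idea of encoding only the $\epsilon$-typical sequences can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the construction used in [2]: it assumes a binary source with $p(1) = p$, enumerates all $2^n$ strings (so it is only usable for small $n$), and assumes the typical set is not empty. The names `build_scheme`, `compress` and `decompress` are hypothetical.

```python
import itertools
import math


def sequence_probability(x, p):
    """P(x_1, ..., x_n) for an i.i.d. binary source with p(1) = p, as in (7)."""
    k = sum(x)
    return p ** k * (1 - p) ** (len(x) - k)


def is_typical(x, p, eps):
    """Check condition (12): |-(1/n) log2 P(x) - H(X)| <= eps."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return abs(-math.log2(sequence_probability(x, p)) / len(x) - h) <= eps


def build_scheme(n, p, eps):
    """Return (compress, decompress, n_bits) for a toy typical-set code."""
    typical = [x for x in itertools.product((0, 1), repeat=n)
               if is_typical(x, p, eps)]
    index = {x: i for i, x in enumerate(typical)}
    # Enough bits to give every typical sequence its own codeword.
    n_bits = max(1, math.ceil(math.log2(len(typical))))

    def compress(x):
        # Non-typical sequences get no codeword: the scheme fails on them.
        i = index.get(tuple(x))
        return None if i is None else format(i, f"0{n_bits}b")

    def decompress(bits):
        return typical[int(bits, 2)]

    return compress, decompress, n_bits
```

For example, `build_scheme(12, 0.2, 0.2)` uses fewer than 12 bits per 12-letter string, at the price of failing on the non-typical strings. The properties reviewed next state that the probability of such failures becomes small as $n$ grows.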

In [2], pp. 537, Shannon's Noiseless Channel Coding Theorem is
demonstrated using the following properties of $\epsilon$-typical
sequences.



Property 1: Fix $\epsilon > 0$. Then, for any $\delta > 0$, for
sufficiently large $n$, the probability that a sequence is
$\epsilon$-typical is at least $1 - \delta$.

Demonstration: Let us consider the following definitions:

$$\xi_r = -\log(p(x_r)), \tag{13}$$

$$m = \langle -\log(p(x)) \rangle = H(X),$$

where $\{\xi_r, r = 1, 2, \ldots, n\}$ are statistically independent
and identically distributed random variables corresponding to an
$n$-letter string $(x_1, x_2, \ldots, x_n)$ and $p(x)$ is the
probability distribution given by expression (1).

Now, consider the following random variable:

$$\xi_s = \sum_{r=1}^{n} \frac{\xi_r - m}{n} = -\frac{1}{n} \sum_{r=1}^{n} \log p(x_r) - m = -\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) - m. \tag{14}$$

By calling

$$y_r = \frac{\xi_r - m}{n}, \tag{15}$$

we can observe that $\xi_s$ is a sum of random variables that
satisfy the assumptions of the Central Limit Theorem (Section 5).
Thus, applying expression (54), we have:

$$P(\xi_s, n \to \infty) = \frac{1}{\sqrt{2 \pi n D(y_r)}} \exp\left\{ -\frac{[\xi_s - n E(y_r)]^2}{2 n D(y_r)} \right\}. \tag{16}$$

However, it can be shown that:

$$E(y_1) = E(y_2) = \ldots = E(y_n) = 0;$$

$$D(y_1) = D(y_2) = \ldots = D(y_n) = \frac{D(\xi)}{n^2},$$

where $\xi$ means any of the random variables $\xi_r$,
$r = 1, 2, \ldots, n$.

Thus the probability distribution has null expectation and its
variance, $n D(y_r) = D(\xi)/n$, goes to zero as $n \to +\infty$.
Thus, we can say that:

$$P\left( \left| -\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) - H(X) \right| \le \epsilon \right) \ge 1 - \delta,$$

for sufficiently large $n$, which demonstrates the property.
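
Property 1 can be illustrated with an exact computation for a binary source. The sketch below assumes $p(1) = p$ and uses the fact that, in this case, $-\frac{1}{n} \log P(x_1, \ldots, x_n)$ depends on the string only through its number $k$ of 1's, so the probability of the typical set is a sum of binomial terms. The function name `typical_probability` is an illustrative choice.

```python
import math


def typical_probability(n: int, p: float, eps: float) -> float:
    """Exact probability that an n-letter binary string is eps-typical."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    total = 0.0
    for k in range(n + 1):  # k = number of 1's in the string
        # -(1/n) log2 P(x) for any string with k ones.
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - h) <= eps:
            # Binomial probability of k ones, in log space to avoid underflow.
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    for n in (50, 200, 1000, 2000):
        print(n, typical_probability(n, p=0.2, eps=0.05))
```

With $p = 0.2$ and $\epsilon = 0.05$ the printed probability is well below one for $n = 50$ and close to one for $n = 2000$, in agreement with the statement of the property.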



Property 2: For any fixed $\epsilon > 0$ and $\delta > 0$, for
sufficiently large $n$, the number $|T(n, \epsilon)|$ of
$\epsilon$-typical sequences satisfies:

$$(1 - \delta)\, 2^{n(H(X) - \epsilon)} \le |T(n, \epsilon)| \le 2^{n(H(X) + \epsilon)}.$$

Property 3: Let $S(n)$ be a collection with at most $2^{nR}$
sequences from the source, where $R < H(X)$ is fixed. Then, for any
$\delta > 0$ and for sufficiently large $n$,

$$\sum_{x \in S(n)} p(x) \le \delta.$$
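
The bounds of Property 2 can be checked by counting typical sequences exactly for a binary source. The sketch below is a minimal illustration, assuming $p(1) = p$ and taking $1 - \delta$ to be the probability carried by the typical set; the name `typical_set_stats` is hypothetical.

```python
import math


def typical_set_stats(n: int, p: float, eps: float):
    """Return (log2 |T(n, eps)|, P(T), H) for an i.i.d. binary source."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    count = 0    # exact integer number of typical sequences
    prob = 0.0   # total probability carried by the typical set
    for k in range(n + 1):  # k = number of 1's in the string
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - h) <= eps:
            c = math.comb(n, k)  # strings with exactly k ones
            count += c
            prob += c * p ** k * (1 - p) ** (n - k)
    # Assumes the typical set is not empty (count > 0).
    return math.log2(count), prob, h


if __name__ == "__main__":
    n, eps = 200, 0.05
    log_count, prob, h = typical_set_stats(n, 0.2, eps)
    print("lower bound:", math.log2(prob) + n * (h - eps))
    print("log2 |T|   :", log_count)
    print("upper bound:", n * (h + eps))
```

The comparison is done on a base-2 logarithmic scale, so the three printed numbers should appear in increasing order.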

The first aim of this work is to extend the above properties when
considering the Tsallis entropy given below.

3 Nonextensive Entropy and Information Theory 

Recently, Tsallis [4] has proposed the following generalized
entropic form:

$$S_q = k\, \frac{1 - \sum_{i=1}^{W} p_i^{q}}{q - 1}, \tag{17}$$

where $k$ is a constant and $p_i$ is a probability distribution:

$$\sum_{i=1}^{W} p_i = 1.$$

By L'Hôpital's rule, it can be shown that:

$$\lim_{q \to 1} S_q = S_1 = H(X), \tag{18}$$

where $X$ is the random variable such that $p(X = x_i) = p_i$.

Besides, also through L'Hôpital's rule, we can show that:

$$-\log p_i = \lim_{q \to 1} \frac{1 - p_i^{q-1}}{q - 1}. \tag{19}$$
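
The limit (18) can be observed numerically. The sketch below evaluates expression (17) for values of $q$ close to 1 and compares the result with the Shannon entropy. One caveat, stated as an assumption of this example: with $k = 1$ the limit is the Shannon entropy computed with natural logarithms, so matching the base-2 entropy used in Section 2 corresponds to choosing $k = 1/\ln 2$. The function names are illustrative.

```python
import math


def tsallis_entropy(probs, q: float, k: float = 1.0) -> float:
    """Tsallis entropy S_q of expression (17); requires q != 1."""
    return k * (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def shannon_entropy_nats(probs) -> float:
    """Shannon entropy with natural logarithms (the q -> 1 limit for k = 1)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    for q in (2.0, 1.5, 1.1, 1.01, 1.001):
        print(q, tsallis_entropy(probs, q))
    print("Shannon (nats):", shannon_entropy_nats(probs))
    # With k = 1 / ln 2 the limit is the base-2 entropy H(X) instead.
    print("Shannon (bits):", shannon_entropy_nats(probs) / math.log(2))
```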



So, let us demonstrate the following property, similar to Property
1, but now considering the entropy given by expression (17):

Property 4: Fix $\epsilon > 0$. Then, for any $\delta > 0$, for
sufficiently large $n$, we can show that:

$$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} \frac{1 - p_i^{q-1}}{q - 1} - S_q \right| \le \epsilon \right) \ge 1 - \delta. \tag{20}$$



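Dem: By using equation (19) we can rewrite the left-hand side of
expression (12) as:

$$\left| -\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) - H(X) \right| = \left| -\frac{1}{n} \log P(x_1, x_2, \ldots, x_n) - S_1 \right| = \lim_{q \to 1} \left| \frac{1}{n} \sum_{i=1}^{n} \frac{1 - p_i^{q-1}}{q - 1} - S_q \right|, \tag{21}$$

where we have set $k = 1$.

Let us define the following random variables:

$$\xi_i = \frac{1 - p_i^{q-1}}{q - 1}, \qquad i = 1, \ldots, n. \tag{22}$$

Thus:

$$E(\xi_i) = \sum_{j=1}^{W} p_j\, \frac{1 - p_j^{q-1}}{q - 1} \tag{23}$$

$$= \frac{1 - \sum_{j=1}^{W} p_j^{q}}{q - 1} = S_q. \tag{24}$$

Thus, the random variables $\xi_i$, $i = 1, \ldots, n$, are such
that $E(\xi_i) = S_q$.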
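
The identity $E(\xi_i) = S_q$ and the concentration claimed in Property 4 can both be checked with a short simulation. The sketch below assumes $k = 1$ and an arbitrary three-letter alphabet chosen only for illustration; the helper names (`xi`, `expected_xi`, `sample_mean_xi`) are hypothetical.

```python
import random


def xi(p_letter: float, q: float) -> float:
    """Value of xi = (1 - p^(q-1)) / (q - 1) for a letter of probability p."""
    return (1.0 - p_letter ** (q - 1.0)) / (q - 1.0)


def tsallis_entropy(probs, q: float) -> float:
    """S_q of expression (17) with k = 1."""
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def expected_xi(probs, q: float) -> float:
    """Exact expectation E(xi) = sum_j p_j * xi(p_j)."""
    return sum(p * xi(p, q) for p in probs)


def sample_mean_xi(probs, q: float, n: int, seed: int = 0) -> float:
    """Mean of xi over n letters drawn i.i.d. from probs."""
    rng = random.Random(seed)
    letters = rng.choices(range(len(probs)), weights=probs, k=n)
    return sum(xi(probs[i], q) for i in letters) / n


if __name__ == "__main__":
    probs, q = [0.5, 0.3, 0.2], 2.0
    print("S_q  :", tsallis_entropy(probs, q))
    print("E(xi):", expected_xi(probs, q))
    for n in (10, 100, 10000):
        print(n, sample_mean_xi(probs, q, n))
```

As $n$ grows, the sample mean settles near $S_q$, which is the behaviour that Property 4 formalizes.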

Since the $\xi_i$ are independent and identically distributed, we
can apply the Central Limit Theorem (Section 5), following the same
development presented for Property 1. So, let us define:

$$y_i = \frac{\xi_i - S_q}{n}, \qquad i = 1, \ldots, n, \tag{25}$$

$$\xi_s = \sum_{i=1}^{n} y_i. \tag{26}$$

From expression (25), we observe that:

$$E(y_i) = 0, \tag{27}$$

$$D(y_i) = D\left( \frac{\xi_i - S_q}{n} \right) = \frac{1}{n^2} D(\xi_i - S_q) = \frac{1}{n^2} D(\xi_i). \tag{28}$$

But $D(\xi_1) = D(\xi_2) = \ldots = D(\xi_n) = D(\xi)$. Henceforth
$D(y_1) = D(y_2) = \ldots = D(y_n) = D(y)$.

Thus, we can apply the Central Limit Theorem and, by using
expression (54), obtain:

$$P(\xi_s = l, n) = \frac{1}{\sqrt{2 \pi D(\xi)/n}} \exp\left\{ -\frac{[l - 0]^2}{2 D(\xi)/n} \right\}. \tag{29}$$

So, when $n \to \infty$ the random variable $\xi_s$ tends to a
Gaussian distributed random variable with zero mean and variance
$D(\xi)/n \to 0$. Property 4 is a straightforward consequence of
this result.

4 Discussion

The last section presents an extension of Property 1 when using
Tsallis entropy. We believe that the same can be done for
Properties 2 and 3.

Tsallis entropy has also been explored in the field of Quantum
Information. Thus, an important step would be to explore
nonextensive quantum approaches for Quantum Information Theory.

The work presented in [5] also shows some results in information
theory by using Tsallis entropy. We must compare our approach with
the one presented in [5].

5 Central Limit Theorem

The following development can be found in more details in

Let us take a random variable $\xi : S \to \mathbb{R}$, where $S$ is
a sample space. The probability that $\xi$ assumes a value $x$ is
given by $P(\xi = x)$ or $P(x)$. Thus,

(30)

or:

(31)



5.1 Expectation, Variance and Characteristic Function

The expectation (or average) and the variance of a random variable
$\xi$ are written as $E(\xi) = \langle \xi \rangle$ and $D(\xi)$,
respectively. They are defined by:

$$E(\xi) = \langle \xi \rangle = \sum_{x} x\, P(x), \tag{32}$$

$$D(\xi) = \langle (\xi - \langle \xi \rangle)^2 \rangle = \sum_{x} (x - \langle x \rangle)^2 P(x). \tag{33}$$

The characteristic function $\phi(\alpha)$ of the probability is
defined by the Fourier Transform of $P(x)$:

$$\phi(\alpha) = E\left( e^{i \alpha \xi} \right) = \sum_{x} P(x)\, e^{i \alpha x}. \tag{34}$$

Inverting (34) gives:

$$P(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \phi(\alpha)\, e^{-i \alpha x}\, d\alpha. \tag{35}$$

From (34) we can show that:

$$E(\xi) = -i \left. \frac{d \ln \phi}{d\alpha} \right|_{\alpha = 0}, \tag{36}$$

$$D(\xi) = - \left. \frac{d^2 \ln \phi}{d\alpha^2} \right|_{\alpha = 0}. \tag{37}$$

From these expressions and the definition (34) we can obtain the
following series expansion for $\ln \phi(\alpha)$:

$$\ln \phi(\alpha) = 0 + i E(\xi)\, \alpha - \frac{1}{2} D(\xi)\, \alpha^2 + \ldots \tag{38}$$

5.2 Sum of Random Variables

Let $\xi_1, \xi_2$ be two random variables defined on the same
sample space $S$ [1].

Let the sum $\zeta = \xi_1 + \xi_2$ be defined as follows:

$$\zeta : S \times S \to \mathbb{R}; \tag{39}$$

$$\zeta(a, b) = \xi_1(a) + \xi_2(b). \tag{40}$$

We can see that $\zeta$ is a random variable.

Let us obtain the probability $P(\zeta = x)$. From the definition of
$\zeta$ we have:

$$\zeta = x \iff \xi_1 = k, \; \xi_2 = x - k.$$

It follows that:

$$P(\zeta = x) = \sum_{k} P_{joint}(\xi_1 = k, \xi_2 = x - k), \tag{41}$$

where $P_{joint}$ is the joint probability distribution. If
$\xi_1, \xi_2$ are statistically independent, then:

$$P_{joint}(\xi_1 = k, \xi_2 = x - k) = P(\xi_1 = k)\, P(\xi_2 = x - k), \tag{42}$$

and so the characteristic function of $\zeta$ is given by:

$$\phi(\alpha) = E\left( e^{i \alpha \xi_1} \right) E\left( e^{i \alpha \xi_2} \right). \tag{43}$$

So, the characteristic function of the sum of two statistically
independent random variables is equal to the product of the
individual characteristic functions. This result can be generalized
for $n$ statistically independent random variables by:

$$\phi(\alpha) = \prod_{r=1}^{n} \phi_r(\alpha), \qquad \zeta_n = \sum_{r=1}^{n} \xi_r. \tag{44}$$

5.3 Central Limit Theorem

This theorem addresses a sum of $n$ statistically independent random
variables $\{\xi_r\}$, which all have the same probability
distribution $P(\xi_r)$, the same expectation $E(\xi_r)$ and the
same variance $D(\xi_r) < \infty$ for all $r$.

The Central Limit Theorem addresses the asymptotic behavior
$n \to +\infty$ of the sum $\zeta_n$ given by expression (44). From
this expression and results (36), (37) we can show that:



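$$E(\zeta_n) = -i \left. \frac{d \ln \phi}{d\alpha} \right|_{\alpha = 0} = \sum_{j=1}^{n} E(\xi_j), \tag{45}$$

$$D(\zeta_n) = - \left. \frac{d^2 \ln \phi}{d\alpha^2} \right|_{\alpha = 0} = \sum_{j=1}^{n} D(\xi_j). \tag{46}$$

From the initial assumptions for $E(\xi_r)$ and $D(\xi_r)$, we can
rewrite these expressions as:

$$E(\zeta_n) = n E(\xi_r), \tag{47}$$

$$D(\zeta_n) = n D(\xi_r), \tag{48}$$

where $\xi_r$ denotes any of the random variables considered.

From the definition of expression (35), we can write:

$$P(\zeta_n = l) = P(l, n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \phi(\alpha)\, e^{-i \alpha l}\, d\alpha. \tag{49}$$

By using the expansion (38), the preceding integral becomes:

$$P(l, n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\left( i E(\zeta_n)\, \alpha - \frac{1}{2} D(\zeta_n)\, \alpha^2 + \ldots \right) e^{-i \alpha l}\, d\alpha. \tag{50}$$

Under the assumptions that, as $n \to \infty$, both
$E(\zeta_n), D(\zeta_n) < \infty$, only the $\alpha \approx 0$
values contribute to the integral (50). In this event, the limits
of integration may be replaced by $(-\infty, +\infty)$ without
incurring gross error. Changing variables by:

$$u = i\, (l - E(\zeta_n)) \tag{51}$$

allows (50) to be rewritten, with $D$ standing for $D(\zeta_n)$:

$$P(l, n) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{D}{2} \left( \alpha + \frac{u}{D} \right)^2 + \frac{u^2}{2D} \right\} d\alpha = \frac{1}{2\pi} \sqrt{\frac{2}{D}}\; e^{u^2/2D} \int_{-\infty}^{+\infty} e^{-\lambda^2}\, d\lambda = \frac{1}{\sqrt{2 \pi D}}\; e^{u^2/2D}, \tag{52}$$

where $\lambda$ is a dummy variable. Thus, we obtain the key result
of the central limit theorem:

$$P(l, n) = \frac{1}{\sqrt{2 \pi D(\zeta_n)}} \exp\left\{ -\frac{[l - E(\zeta_n)]^2}{2 D(\zeta_n)} \right\}. \tag{53}$$

By using expressions (47)-(48) this result can be rewritten:

$$P(l, n) = \frac{1}{\sqrt{2 \pi n D(\xi)}} \exp\left\{ -\frac{[l - n E(\xi)]^2}{2 n D(\xi)} \right\}, \tag{54}$$

where $\xi$ represents any of the random variables $\{\xi_r\}$.

6 Conclusions

This work reports a preliminary result in the extension of
Shannon's Information Theory to nonextensive approaches. The
Tsallis entropy was considered to accomplish this goal.

Our main aim is to extend the Noiseless Channel Coding Theorem by
using Tsallis entropy. Then, we aim to explore nonextensive quantum
information methods.

References

[5] T. Yamano. Information theory based on non-additive information
content. Technical report, http://www.arxiv.org/pdf/cond-mat/0010074, 2001.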